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We propose to approximate the conditional expectation of a spatial random variable given its 
nearest-neighbour observations by an additive function. The setting is meaningful in practice 
and requires no unilateral ordering. It is capable of catching nonlinear features in spatial data 
and exploring local dependence structures. Our approach is different from both Markov field 
methods and disjunctive kriging. The asymptotic properties of the additive estimators have been 
established for a-mixing spatial processes by extending the theory of the backfitting procedure to 
the spatial case. This facilitates the confidence intervals for the component functions, although 
the asymptotic biases have to be estimated via (wild) bootstrap. Simulation results are reported. 
Applications to real data illustrate that the improvement in describing the data over the auto- 
normal scheme is significant when nonlinearity or non-Gaussianity is pronounced. 
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1. Introduction 

Markov random fields and kriging are two important tools for investigating continuous 
spatial data. The former, including the auto-normal scheme of Bcsag [1] and the frame- 
work for exponential distribution families of Cressie ([5], Chapters 6 and 7), is for data 
on a lattice (or a graph). The latter is for irregularly positioned data; see Rivoirard [23] 
and Chiles and Delfmer ([4], Chapter 6). They both rely on parametric assumptions on 
the underlying processes. In contrast, nonparametric techniques have only found limited 
use in spatial modelling. This is largely due to the difficulties associated with the 'curse 
of dimensionality'. For example, a purely nonparametric estimation of the conditional 
mean at one location, given its (regularly spaced) four nearest-neighbour observations, 
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involves four-dimensional smoothing. Although scmiparametric and nonparametric au- 
toregressive models with additive noise have proved to be successful in modelling time 
series, such a structure has not been available for spatial processes simply because there 
exists no natural unilateral order over a plane. On the other hand, the Markov assump- 
tion is more restrictive for spatial processes; for instance, a conditional Gaussian Markov 
model essentially implies linearity (Gao et al. [10]). 

In this paper, we propose to approximate the conditional expectation of y(s), the value 
of a spatial process at location s, given its d nearest-neighbour observations, by an addi- 
tive function, and we estimate this additive approximation by adapting the backfitting 
algorithm of Mammen et al. [15] which involves up to two-dimensional smoothing only, 
regardless of the value of d. Our approach is linked to a lattice setting. Note that data on 
a regular grid and measured on a continuous scale are becoming more and more common 
with the increasing use of computer technology. We refer to Section 5 for some tentative 
ideas on extending the approach to handle irregularly spaced data. 

Our additive approximation may be viewed as a projection of the conditional expecta- 
tion into the Hilbert space spanned by additive functions. In fact the projection principle 
itself does not require a lattice framework. In the context of spatial modelling, it has 
been used in the form of disjunctive kriging (Matheron [18, 19]; Rivoirard [23]; Chiles 
and Delfmer [4], Chapter 6). Disjunctive kriging projects Y(s) into an additive space 
spanned by Y(si) for all Sj ^ s. Very often what is of interest is a functional of Y(s) 
rather than Y(s) itself. Nevertheless the projection principle still applies with Y(s) re- 
placed by f(Y(s)) for some function /. For non-regularly spaced sites it is difficult to 
use nonparametric estimation because of the lack of repeatability of the spatial pattern 
of neighbours as one moves from one site to another. Instead, disjunctive kriging intro- 
duces parametric assumptions on the bivariate distributions for all pairs (Y(si),Y(sj)), 
which then, building on appropriate isofactorial models, implies a parametric form for 
the projection of interest; sec Chiles and Delfmer ([4], Chapter 6) and Rivoirard [23]. 

Our approach is nonparametric and pragmatic; we do not impose any explicit form on 
the underlying process. Instead we seek the best additive approximation to the unknown 
conditional expectation, which itself may not be additive. This enables us to describe 
local spatial dependence structure with a potential application to texture analysis. For 
example, the nonlinear structure demonstrated in the additive approximation for the 
straw data in Section 4.2 is beyond the capacity of an auto-normal fitting and would be 
difficult to describe using disjunctive kriging. Our approach also provides a vehicle for 
testing isotropy and/or linearity; see the bootstrap test in Example 2 in Section 4.1. This 
may serve as a guide for choosing a parametric model. Of course, those advantages come 
at some cost. For example, abandoning the Markov framework implies that Markov chain 
Monte Carlo and other important analytical tools are not at our disposal. This may be 
a severe obstacle when dealing with non-stationary spatial processes. 

Another way of circumventing the curse of dimensionality is to use scmiparametric 
(partially linear) additive approximation if some components are found to be linear; this 
is explored by Gao et al. [10] with the marginal integration technique (Linton and Nielsen 
[14]; Newcy [20]; Tj0stheim and Auestad [24]). The marginal integration method is less 
efficient in practice than back-fitting when d is large, in spite of its good asymptotic 
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properties (Fan et al. [9]). Both methods require density estimation. It should be noted 
that nonparametric density estimation for spatial processes can be traced back at least 
to Diggle [7] and Diggle and Marron [8]. For more recent developments, see Carbon et 
al. [3], Hallin et al. [11, 12, 13] and Yao [26]. 

The rest of the paper is organized as follows. The methodology is laid out in Section 2. 
Asymptotic properties are stated in Section 3. The asymptotic distributions of the es- 
timators are used to construct pointwise confidence intervals for component functions 
in the additive approximation, although the asymptotic biases are estimated via wild 
bootstrap. Numerical illustrations with both simulated and real data sets are reported 
in Section 4. A brief discussion on possible extension to handling irregularly spaced data 
is presented in Section 5. All technical proofs are relegated to the Appendix. 



2. Methodology 

2.1. Least-squares additive approximation 

Suppose {F(s)} is a strictly stationary process defined on a two-dimensional lattice, that 
is, s = (u,v) € Z 2 , where Z denotes the set consisting of all integers. Let ii, . . . , i<j be 
d fixed neighbour points of (0, 0) in Z 2 , and x = (xi, . . . , Xd) T G IR d . It is of interest to 
approximate the conditional expectation 



by an additive form 



m(x) = E{F(s)|Y(s -u)=x t ,l=l,...,d} 



m a + mi(xi) H h m d (x d )- 



(2.1) 



(2.2) 



We seek the optimal approximation in a least-squares sense; see (2.4) below. To make 
the terms in (2.2) identifiable, we require J mj(y)fo(y) dy = 0, j = 1, . . . ,d, where /o(-) 
denotes the marginal density function of Y(s). If m(-) itself is of the form (2.2), it is easy 
to see that 

m j (x j ) = E{Y(s)\Y(s-i j )=x j }-m - ^ E[me{Y(s-i e )}\Y(s-i j )=x j ]. (2.3) 

\<i<d 

In general, we obtain an optimum additive approximation by minimizing 



E 



Y(s) m e {Y(s - i c )} 



or equivalently, 



E 



m{X(s)} -m m e {Y(s - i c )} 



i=i 



(2.4) 



(2.5) 



450 Z. Lu et al. 

over m + X^=i m <K') e -^add, where X(s) = {F(s- ii), . . . ,Y(s - i d )} T and 



f=l 



mo£t, mt(y)fo{y) dy = for 1 < £ < d 



(2.6) 



2.2. Estimators 

We now spell out how to estimate the best additive approximation for the conditional 
expectation (2.1). To simplify notation, we assume that observations {(Y(s^), X(s£)), 1 < 
i < N} are available. Furthermore, we assume that those data are taken from a rectangle 
in Z 2 , for example, 



{si, . ..,s N } = {(u,v):u=l, ...,Ni,v = l,. ..,N 2 }, 



(2.7) 



where N1N2 = N. Other sampling schemes are possible; see the remark at the end of 
Section 3.1 below. 

In practice we replace (2.5) by 



J\ m(x) - m - ^2m e (x e ) > /(x)dx, 



(2.8) 



where x = (x\, . . . , Xd) , and 

iY 



N d 



m(x) 



1=1 



1 N 

( x ) = ^E y ( s ^{ x - x ^)}- 

^=1 



(2.9) 



In the above expression, Kh(x) = h 1 K(x/h) 1 K( ) is a density function on K, and h > 
is the bandwidth. We also define 

1 N 1 N 

iV e=i N JA x i) 1=1 

1 w 

fjk(xj,x k ) = — y]Kh{xj - y(s £ - ij)}K h {xk - Y(s e - i fe )}. 



Note that fj and fjk are the marginal density functions from the joint density /. 
Let {fhi} S ^"add be a minimizer of (2.8), where 



•Fadd = I mo + ^2 mt(xe) 
{ 1=1 



m 6 R, / mt(y)fi(y) dy = for 1 < I < d 
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Then the least-squares property implies that 



m(x) — too 



d _ 1 

^m^(:rf) >{fhj(xj) -mj(xj)}f(x)dx = 
e=i J 



for any rrij(-), j = 0, 1, . . . ,d. (We write mo(-) = mo-) 
This leads to 

in a = /"jm(x)-^m^)j/(x)dx = / m(x)/(x) dx = 1 ^ y(s £ ) = F, (2.10) 
J I J ^ i=\ 

and for j = 1, . . . , d, 

m j (x j )=m j (x j )-m -y] [ fhi{xi)^J^^-dx e . (2.11) 

n,:J tAXn) 



We always follow the convention that x/y equals if y = 0. It is easy to see that (2.11) 
intimately resembles (2.3). It also naturally leads to the following backfitting algorithm: 
in the jth step of the rth iteration cycle we define 

mW ( Xj )=m j (x j )-Y-J2 f "tf (x t ) fj i Xj,Xe) dx e 



3 

£<j' 



fj( x j) 



E~(r-i)/ -, fje( x j' x e) , 
/ m\ '(xe) ~ dxi- (2.12) 



fj( x i) 



We choose Nadaraya-Watson (i.e. local constant) estimation to keep our exposition as 
simple as possible. For general discussion of smoothing backfitting algorithms, including 
the one based on more efficient local linear estimation, we refer to Mammcn et al. [15] and 
Nielsen and Sperlich [21]. More recently, Mammen and Park [17] showed that if in (2.12) 
rhj is replaced by a marginal local linear estimator, and fji/fj is replaced by a more 
sophisticated functional constructed using a convolution kernel, the resulting backfitting 
method is asymptotically as efficient as the one based on local linear estimation. 

Finally, we remark that the minimizcr of (2.8) is the same as the minimizer of 

4 /E( y ( s ;)- m °-E m ^)| A^{x-X(s,)}dx. 

J j=l I 1=1 ) 

This lends support to the use of a simple leave-one-out cross-validation bandwidth esti- 
mator: 

d "I 2 
Y(8 j )-fh - j -'^2ffie- j {Y(8 j -ie)} , (2.13) 



N 



h = arg rain 



j=i L 
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where fh^-j is the backfitting estimator of mi without the jth observation (Y(sj), X(sj)). 
Nielsen and Sperlich [21] proposed some modifications to make this bandwidth selector 
computationally more efficient. Three other data-driven bandwidth selectors for additive 
modelling based on backfitting were proposed in Mammen and Park [16]. 

3. Asymptotic properties 
3.1. Regularity conditions 

In order to present asymptotic results, we define the a-mixing coefficients for spa- 
tial processes first. For any A C Z 2 , let J 7 (A) denote the a- algebra generated by 
{(X(s), Y(s)), se A}. We write \A\ for the number of elements in A. For any A, B C Z 2 , 
define 

a(A,B)= sup \P(UV)-P(U)P(V)\, 
ueJ r {A)y£r{B) 

and d(A, B) = min{||si — S2 1| | Si G A, S2 £ B}, where || • || denotes the Euclidean norm. We 
may define an a-mixing coefficient for the process {(X(s), Y(s))} as 

a(k;i,j) = sup {a{A,B)\\A\<i,\B\<j,d(A,B)>k}, (3.1) 

A.BcZ 2 

where k are positive integers and i,j may be infinite. For further discussions on the 
mixing for spatial processes, we refer to Section 1.3.1 of Doukhan [6] and Section 2.1 of 
Yao [26] and references within. 

Let C denote some positive generic constant which may be different at different places. 
The following regularity conditions are imposed. 

(CI) The density functions / of Y(s) and fjk of {Y(s — ij),Y(s — ifc)} have contin- 
uous second derivatives, and are bounded from above by a constant indepen- 
dent of ij — ifc. The conditional expectation OTy(-) has continuous first deriva- 
tive. The density functions of X(s) conditional on Y(s), and {X(i),X(s)} con- 
ditional on {Y(i),F(s)} are bounded from above. Furthermore, for some A > 
and N-^^h-^^O, 

E{exp(A|y(s)|)} <oo. (3.2) 

(C2) The kernel function K(-) is symmetric, compactly supported and Lipschitz con- 
tinuous. 

(C3) As N = N1N2 — > 00, N1/N2 converges to a positive and finite constant, the band- 
width h — > and 

N P-5 h P+5(i ogN yW+?) _ (3.3) 
where (3 > 5 is a constant. 
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(C4) a{k; k', oo) < Ck- p for any k and k' = 0(k 2 ). Furthermore, J2kLi j, I) < 

oo for some j + t < 4, a(fc; 1, oo) = o(fc- d ) and J2T=i fc d_1 a(fc; 1, 1)( 5 - 2 )/ 5 < oo 
for some 5 > 2. 

Conditions (C1)-(C2) are standard in kernel estimation. Both the assumption of the ex- 
istence of the moment generating function of |y(s)| and (3.3), which imply the optimum 
uniform convergence rates (A.l) and (A. 2), can be relaxed at the cost of lengthy argu- 
ments. On the other hand, for causal and invertible (under the half-plane order) spatial 
ARMA processes satisfying some mild conditions, a(/c;fc',oo) decays at an exponential 
rate as k — > oo (Remark 2.1 of Yao [26]). Therefore, condition (C4) is fulfilled. For op- 
timum bandwidth h = OiN' 1 / 5 ), (3.3) is fulfilled for (3 > 7.5. Condition (C3) requires 
two sides of the sampling rectangle to increase to infinity. In fact this assumption can be 
relaxed. For example, our theoretical results will still hold if the observations were taken 
over a connected region in Z 2 , and the ratio of the minimal side length of the squares 
containing the region to the maximal side length of the squares contained in the region 
converges to a constant in the interval (0,oo). For a general discussion on the condition 
of sampling sets, we refer to Perera [22]. 

3.2. Convergence of backfitting 

Backfitting techniques have proved effective in handling complex model fitting. However, 
its convergence is typically difficult to handle. We apply Theorem 1 of Mammcn et al. 
[15] to show that a modified version of backfitting (2.12) converges. The modification is 
in line with Section 5 of Mammen et al. [15] in order to fulfil certain regularity conditions 
which simplify technical arguments substantially. 

Let A C M. d be a compact set on which the density function of X(s) is positive. 

Define 



Then p(-) is a density function on R d . As an illustration, Mammcn et al. [15] chose 
A = [0, l] d . Since the components of X(s) are dependent, sets of cylinder type are not 
always relevant. (For example, the support of (X t ,X t -i) for linear AR(1) time series 
would be around a line segment with non-zero slope.) Denote by pj(xj) and pjk(xj,Xk) 
respectively the jth univariate and the (j, k)th bivariate marginal density functions of 
p(x). We require the following consistency condition on the set A. 

(C5) There exist compact sets Aj C {f(xj) > 0} and Ajk C {fjk(xj,Xk) > 0} such that 
for 1 < j, k < d and j ^ k, 



P( x ) =Pi,....d(x) 



/i,...,d(x)J(xe>l) 
P{X(s) G A} 



f{x j )I(x j £A j ) 
P{Y(s) g A,} 



Pjk(Xj,X k ) 



P[{r( s -i i ),F(s-i fc )}eA jfc ]- 



fjk(xj,x k )I{(xj,x k ) 6 A jk } 



Due to stationarity, a relevant set A often exhibits certain symmetries. For example, we 
may observe Aj = Aj and Pi{-) =Pj(') for all i and j. 
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Differently from Section 5 of Mammen et al. [15], we define estimators for pj and pjk 
as follows: 

(3.4) 

a^-.so = j { ( 8j> x>) e ^> ^§ia^ : - z ?»» (3 . s) 

Efci J [{ y (s< " ijO.^fa - ife)} € A,-*] 

Obviously and pjfc are consistent estimators for pj and p,/,,, respectively. Note that 
K(-) is compactly supported. For Xj G Aj, Kh{xj — Y(sg — ij)} may be non-zero for 
sufficiently large N only if Y(si — ij) <G A'j, where Aj is a compact set sandwiched be- 
tween Aj and {f(xj) > 0}. Therefore, similarly to Mammen et al. [15], we effectively 
only use the observations in a compact set when estimating pj . It is possible now that 
fpjk(xj,Xk)dxk j^Pj(xj). Similarly to Mammen et al. [15], we modify the backfitting 
procedure (2.11) and (2.12) accordingly: 

(,, = ^,,) _ ^ _ g / - 

+e fsr't.i^- 1 !^!^, (3.7) 

where moj = / fhj{x)pj{x) dx/ J Pj(y) dy. Note that, for Xj G Aj, fhj(xj) defined in (2.9) 
may be written as 

**,(*,) = l(x 3 e Aj) ^nse)K h {xj-Y( Si -ij) } _ 
Pj( x j)J2e=i I{Y{ Si - ij) e A,-} 

The following theorem indicates that this backfitting procedure converges exponentially 
fast. 

Theorem 1. Under conditions (C1)-(C5), with probability tending to 1, there exists a 
unique solution {fhj} of (3.6), and further, for r > 1 and x = (x\, . . . , Xd) T being an 
inner point of A, 

Yl J {™ { l\ x j)-™j( x j)} 2 P3( x j) dx j ^ C P 2r (^ 1 + Ylj {™?\ x i)} 2 Pi( x i) dx ^> 

where p € (0, 1), C > are constants, fhj(Xj) is defined by (3.7), and {fhj(xj)} are the 
initial values of the backfitting algorithm. 
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Henceforth, we assume that x = (x\, . . . , Xd) is an inner point of A. Let e(s) = Y(s) — 
m{X(s)}, and m°(x) = m,Q + J2j=i m j( x j) k° the minimizer of (2.5) over 

■^add = |m(x) = ro + y^jnjjxj) m G M, ^ m j (y)p j (y) dy = for 1 < j < dj. 

Then {mg,TO°(-), . . . ,m^(-)} is uniquely determined by the least-squares property. Define 



/3(x) =^|rh?(x i )^-logjj(x) + ^mjfejj | u 2 A»du, (3.8) 



Let /3q + 53iLi Pj{ x j) be the minimizer of 



|j/3(x)-/? -X>fe)} Mx)dx 



(3.9) 



0Ver Kdd' 



Theorem 2. Let conditions (C1)-(C5) hold, and h^CN^ 1 / 5 . Then 

I fhi{x 1 )-ml{x-y)-h 2 l3 1 {x 1 )\ 
VNh : -^iV(0,S(x)), 

\m d (a:d) -m° d (x d ) - h 2 f3 d (x d ) J 

where S(x) is a diagonal matrix with 

o-jixj) 2 =var[F(s) - m {X(s)}|F(s - L,-) = xj] J K{u) 2 du/fjixj) (3.10) 
as its jth main diagonal element. 

Remark 1. (i) Although we do not assume the true conditional expectation m(x) de- 
fined in (2.1) to be of additive form, the estimators do not have extra biases due to the 
discrepancy between m(x) and its best additive approximation m°(x). This is due to the 
'orthogonality' 

/"{m(x)-m°(x)}p(x)JJda; fe = 0, l<j<d, (3.11) 
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which is guaranteed by the least-squares property that m°(-) is the minimizer of (2.5) 
over J"add- On the other hand, the variance in (3.10) is equal to 

(var[y(s) - m{X(s)}|F(s - L,-) = Xj ] 

+ vax[m{X(s)}-m°{X(s)}\Y(s-i j )=x j }) J K{uf du/pj(xj). 

The second term in the above expression disappears when to(x) itself is an additive 
function. 

(ii) The proof of Theorem 2 entails that 

mj(xj) - Tn%{xj) - h 2 (3 3 (xj) + o p (h 2 ) 
1 N 

= J^-)jy^ ~ m°ms e )}}K h { X] - Y(s e - i,)}. (3.12) 

By Theorem 2, an approximate 95% pointwise confidence interval for m°(xj) would 
be of the form rhj( X j) — h 2 f3j( X j) ± 1.96a j( X j)/Vnh. However, the quantities fijixj) and 
(jj{xj) are unknown in practice. Furthermore, it is rather difficult to estimate /%(•)> 
see (3.9) and (3.8). We now outline a heuristic method based on wild bootstrapping to 
estimate the bias f3j(xj) and the variance <jj(xj) 2 . 

Let {e(s)} be independent and identically distributed random variables with mean 
and variance 1. Draw a bootstrap sample F(si)*, . . . , F(sjv)* from the model 

Y(s)* = m{X(s)} + e(s)[Y(s) - m{X(s)}], (3.13) 

where m(x) = too + X)j=i ™j( x j)- Then E*{y(s)*|X(s) = x} = to(x). Let {m*} be the 
estimators obtained in the same way as {fhj} but with sample {K(sj), X(sj)} replaced 
by {Y(sj)*, X(s :) )}. It may be shown that m*( X j) — rhj(xj) shares the same asymptotic 
distribution as fhj(xj) — m°( X j); see Theorem 2 above. Hence, we may use the sample 
mean and the sample variance of m*(xj) — fhj(xj) obtained in a repeated bootstrap 
sampling (with a large number of replications) as the estimates for the mean and the 
variance of rhj( X j) — m°(xj). Combining with Theorem 2, this leads to an approximate 
95% pointwise confidence interval for rrij(xj) (1 < j < d): 

2m J (x J ) - fh*( Xj ) ± 1.96 S ;(x,), (3.14) 

where rh*(xj) is the sample mean of m*(xj) in the repeated bootstrap sampling, and 
s*(xj) is the sample standard deviation of m*( X j) — rhj(xj). 

Remark 2. Note that the conditional expectation E*{F(s)*|X(s) = x} = m(x) is an 
additive function, while E{F(s)|X(s) = x} = m(x) may not be. This makes it difficult to 
construct confidence intervals directly based on bootstrapping. The confidence interval 
(3.14) is based on the asymptotic normality of the estimator fhj(xj). Bootstrapping was 
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merely employed to estimate the unknown asymptotic bias f3j(xj) and variance <Tj(xj) 2 , 
which relied on the fact that [3j(xj) is determined by the best additive approximation 
m°(x) of m(x) instead of m(x) itself; see (3.9) and (3.8). On the other hand, it may be 
shown that 

m*(xj) - fhj{xj) - tffijixj) + o p (h 2 ) 

1 N 

= N -.( t E^)* - m{X(B t )}]K h { Xj - Y(s e - 
Pj\ j) e=1 

1 N 

= J^T^A E e ( s )[ F ( s *) - fh{X( Se )}]K h { Xj - Y(s e - i,)} 
1 N 

= W^T~)T, £ ^ Y ^ ~ m {y.{ Sl )}]K h { Xj - Y(a t - i,)}{l + o p (l)}; 

see (3.12). This would ensure that the bootstrap estimator admits the same asymptotic 
variance. 



4. Numerical properties 
4.1. Simulation 

In this section, we illustrate the proposed backfitting procedure with two examples: a 
unilateral additive model under the half-plane order (Whittle [25]), and a (bilateral) auto- 
normal model (Besag [1]). The bandwidth selection procedure (2.13) is implemented in 
Example 1. In Example 2 we apply a parametric bootstrap test to test the null hypothesis 
of an auto-normal model. In the numerical examples let K be a Gaussian kernel. 

Example 1 Unilateral additive model. Consider the model 

Y(u,v) =sin{y(w- l,v)} +cos{Y(u,v - l)} + e(u,v), (4.1) 

where e(u,v) are independent N(0, 1) random variables. Hence 

E{Y(u,v)\Y(u-l,v),Y(u,v-l),Y(u-l,v-l)} 

= m + mi {Y(u - 1, v)} + m 2 {Y{u, v - 1)} + m 3 {Y(u -l,v- 1)} (4.2) 

with mo = E{Y(u, w)}, m^{-) = 0, and 

m\{x) = sin(x) — E[sin{F(u, v)}], m 2 (x) = cos(a;) — E[cos{y(u, v)}]. 

We drew 100 samples from model (4.1) on the rectangle {{u,v) :l<u<24,l<v< 28}. 
For each sample we estimated the component functions rrij(-) for j = 1,2,3 with the 
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bandwidths h chosen automatically by the leave-one-out procedure (2.13). The boxplots 
of the estimated curves over 13 regular grid points are presented in Figure 1. While the 
estimation is accurate overall, the variation of the estimation is larger at the both ends 
due to boundary effects. The mean and variance of the selected bandwidths over 100 
replications are 0.416 and 0.064. respectively. 

Example 2 Besag's first-order auto-normal scheme. Let the conditional distribution of 
Y(u,v) given {Y(i,j), ^ (u,v)} be normal with mean 



and variance vax{Y(u,v)\Y(i,j), ^ (u,v)} = 1, where Q\ = 0.2 and 82 = 0.25. Now 
the {Y(u,v)} are jointly normal with common mean 0, and 

E{Y(u,v)\Y(u -l,v) = X!,Y(u,v- 1) = x 2 ,Y(u + l,v) =xs,Y(u,v + 1) = £4} 
= mi(xi) + m 2 (x 2 ) + m 3 (x 3 ) + m 4 (x 4 ) 

with m\{x) = m 3 (x) = 9ix, 7712(2) = 777.4(2;) = 82X] see Besag [1]. 

We conducted a simulation with 500 replications. For each sample taken on the rect- 
angle {(u, v) : 1 < u 1 v < 20}, we applied the backfitting algorithm to estimate rrij{-). The 
boxplots of the estimated curves over 11 grid points are presented in Figure 2. To speed 
up the computation, we used a fixed bandwidth h = 0.4. The linearity of rrij(-) is evi- 
dent in Figure 2. In fact a simple linear least-squares fitting for the estimated values of 
m.j{-) led to the estimated slopes 0.2013, 0.2425, 0.2049 and 0.2552, for j = 1,2,3 and 4, 
respectively, very close to the true values. 

We also applied a parametric bootstrap method to test the null hypothesis of the auto- 
normal scheme (4.3): the bootstrap samples were generated from the auto-normal process 
with #1 and 62 in (4.3) estimated by the coding method (Besag [1]). Note that under the 
auto-normal scheme, E[e(s)/{X(s) £ B}] = for any measurable B C R 4 , where er(s) is 
defined as the difference between Y{s) and the right-hand side of (4.3), and X(s) consists 
of the four nearest neighbourhoods. This leads to the test statistic 



where X(s 3 ) < X(sfe) is defined under the unilateral half-plane order (Whittle [25]), and 



e( Sj ) = Y{ Sj ) - e 1 {Y(u j - 1, Vj ) + Y(uj + 1, Vj)} - 6 2 {Y( Uj , Vj - 1) + Y( Uj , Vj + 1)}. 



Among the 500 replications, the proportions rejecting the linear null hypothesis are 
10.8% at the level a = 10%, and 4.4% at the level a = 5%. To further assess the accuracy 
of the bootstrap approximation, we took the upper 10th and 5th percentiles for the 



E{Y(u,v)\Y(i,j),(i,j)^(u,v)} 

= 0i{r(7i - 1, v) + Y(u + 1, v)} + 8 2 {Y(u, v - 1) + Y(u, v + 1)} 



(4.3) 




(4.4) 
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Figure 1. Boxplots of the estimators for (a) mi(i) = sin(x) — E[sin{Y(ti, v)}], (b) m,2(x) 
cos(x) — 'E[cos{Y (u,v)}], and (c) ms(x) = in Example 1. 



empirical distribution of the 500 simulated values T as the true critical values t a for 
the test at the level a = 10% and 5%, where Figure 3 displays the boxplots of the 
relative frequencies of the event T* > t a in the 200 bootstrap replications. This shows 
that most frequencies arc clustered around a for both a ~ 10% and 5%, indicating that 
the bootstrap approximation to the null distribution of T is reasonably accurate. 



4.2. A real data example 



Figure 4 displays a magnetic resonance (MR) image of two test tubes filled with plastic 
straws of two different diameters embedded in gadolinium-doped agarose gel with tissue 
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Figure 2. Boxplots of the estimators for (a) mi(i) = 0.2a;, (b) 7712(2;) = 0.25a;, (c) 7713(3;) = 0.2x, 
and (d) 7714(2;) = 0.25s in Example 2. 



equivalent relaxation times. The straws test object was imaged with a Tl-weighted SE 
pulse sequence on a Siemens Vision 1.5 T MR scanner using slice thickness of 4 mm and 
in-plane resolution of 0.6 mm x 0.6 mm. In the MR images of Figure 4, the more white 
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(a) (b) 




Figure 3. Boxplots of the relative frequencies of the event {T* > t a } for (a) a = 0.1, and (b) 
a — 0.05 in Example 2. 



in) (h) 




Figure 4. Modelling the straw data: MR images from the straws test object (a) depicting a lon- 
gitudinal section with indication of the trans-axial slices, and (b) showing the upper trans-axial 
slice. 

a voxel is, the stronger the signal intensity. The black background region with very low 
intensity is air surrounding the two cylinders. 

For our analysis, we chose two stationary-like subsets of image Figure 4(b), each of 
size 61x61. The subset images are plotted respectively in Figures 5(a)(i) and 6(a)(i). For 
each subset, we approximated the conditional expectation 



E{Y{u,v)\Y(u-1,v) = x-l,Y(u,v-1)=x 2 ,Y(u + 1,v)=X3,Y(u,v + 1)=X4} (4.5) 
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by an additive form 

mo + m\(x\) + m 2 (x 2 ) + m 3 (i 3 ) + 7714(0:4), 

where m\(xi), 7712(2:2), 7713(0:3) and 7714(2:4) represent the contributions from the nearest 
neighbourhood in the north, west, south and east direction, respectively. For comparison 
purposes, we also fitted the data using Besag's ([1]) first-order auto- normal scheme, 
assuming the conditional variances over different locations were the same. This leads to 
the assumption that the conditional expectation (4.5) is of the form 

a + pi (x\ — a) + (3 2 (x2 - a) + fli (x 3 - a) + f3 2 (.14 - a). (4.6) 

The coefficients (3j and a were estimated using Besag's coding method. The estimated 
additive functions fhj (x) , together with the fitted straight lines (3j (x — a), are plotted in 
Figure 5 for the large-diameter straws and in Figure 6 for the small-diameter straws. As 
in Example 2 above, we also applied the parametric bootstrap based on statistic (4.4) for 
testing the auto-normal null hypothesis for those two subsets (with 200 bootstrap replica- 
tions), leading to p- values less than 0.005. This indicates that the first order auto-normal 
scheme is inadequate for both the data sets. The histograms presented in Figures 5(a) (ii) 
and 6(a) (ii) indicate bimodal marginal density functions; the lower-intensity mode cor- 
responds to voxels at the boundary between straws, and the higher- intensity mode to 
voxels in the interior of the straws. The pointwise confidence intervals for mj(-) were 
obtained using the standard normal e(s) in (3.13) with 100 wild bootstrap replications; 
see (3.14). 

The plots of the rrij(-) functions as a rough approximation suggest isotropy, which is 
natural given the set-up of the straws. Both the plots and bootstrap test point to nonlin- 
earity. Note that the bends at the ends of the curves cannot be attributed to boundary 
effects, as substantial number of voxels fall in the end regions; see the histograms in 
Figures 5 and 6. 

A possible explanation for the bends is as follows: for both the large- and small- 
diameter straws there is a positive correlation among intensity values in the middle (the 
rrij(-) curve has a positive slope). These intensity values correspond to voxels in the 
center of the straws, for which it is seen (from Figures 5(a) (i) and 6(a) (i)) that there is 
a positive spatial autocorrelation. For the small-diameter straws there is local negative 
correlation at both ends of the curves. Looking at the image (Figure 6(a) (i)) it is seen 
that the lowest-intensity values (darkest) voxels are at the boundaries as expected. But it 
is also seen that there are voxels of very high intensity (very white) close to the boundary. 
Similarly, the voxels of highest intensity are often found close to the boundary and have 
low-intensity voxels in their neighbourhood, resulting in the local negative correlation 
for extreme intensity values. For large-diameter straws with low-intensity voxels at the 
boundary, the same pattern occurs but not to the same extent; see Figure 5(a)(i). The 
picture for voxels of high intensity is less clear, as some of these are located close to 
the boundary surrounded by low-intensity voxels, others close to the centre with high- 
intensity neighbours. There is no clear dependence pattern for high values, which is 
echoed by the flatness of the rrij(-) plots for high intensities. 
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(a) 
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Figure 5. Modelling subset I of straw data, (a) (i) A subregion of interest (61 x 61 window, of 
which a 31 x 31 detail is shown) from the bundle of large diameter straws in Figure 4(b); (ii) 
the corresponding signal intensity histogram. Pixel signal intensity in a.u. (12-bit range), (b) 
Additive estimators (solid lines), the boundaries of pointwise confidence intervals (dotted lines), 
and auto-normal scheme estimators (dashed lines) for (i) mi(x), (ii) 7712(2;), (iii) 7773(2;) and (iv) 
7714(2:) and mo = 2278.615. 
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Figure 6. Modelling subset II of straw data, (a) (i) A subregion of interest (61 x 61 window, 
of which a 31 x 31 detail is shown) from the bundle of small diameter straws in Figure 4(b); 
(ii) the corresponding signal intensity histogram. Pixel signal intensity in a.u. (12-bit range), 
(b) Additive estimators (solid lines), boundaries of pointwise confidence intervals (dotted lines) 
and auto-normal scheme estimators (dashed lines) for (i) mi(x), (ii) 7722(2;), (iii) 7723(0:) and (iv) 
7724(2:) and mo = 2089.465. 
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Overall the nonparametric additive approximations for conditional means describe the 
local correlation structure of the straws quite well, whereas the auto-normal models fail 
to do so since they only reproduce the dominating positive spatial autocorrelation in the 
interior of the straws. 

5. Discussion 

Observations taken on irregular grids often occur in practical spatial problems. We outline 
below some tentative ideas to extend the method proposed in this paper to handle irreg- 
ularly spaced data. Our basic assumption is that the observations {Y(sj),j = 1, . . ., A^} 
(after dctrcnding appropriately) are taken over an irregular grid from a strictly stationary 
process V(s) with index s varying continuously on IR 2 . Our goal is to estimate the best 
additive approximation, in the sense of (2.4) and (2.5), for the conditional expectation 
at a fixed location given its d neighbourhood observations, for small d such as 3 or 4. 
Without loss of generality, we may assume that the location is at = (0,0) T , and the d 
neighbourhood locations are ii, . . .,i<j. Put X(0) = {Y(ii), . . . , Y(id)} T ■ Our task is now 
to estimate the best additive approximation for 

m(x) = E{F(0)|X(0) =x}. 

First, for each location s& at which we have an observation Y(sfc), we define its d 
neighbourhoods selected among the other N — 1 observations by minimizing 

d 

£l|i;-(s J -s fc )||, 
J=l 

where the minimization is taken over s J G {si, . . . ,sjv}, s? ^ Sfc, and s 1 ,...^ are 
all different from each other. Let (ski, ■ ■ ■ ,Skd) be the minimizcr. Then X(sfc) = 
{Y(sfci), . . . , Y(skd)} T are the d neighbourhood observations of Y(sk) as far as our task 
is concerned. Put 

d 

X k = ^\\i 3 - (sjk-Sk)W, 

3 = 1 

which measures the discrepancy between the pattern of (sifc, . . . , Sdk) in relation to s k 
and that of (ix,...,i<j) in relation to 0. It is easy to see that X k = if and only if 
(sfc,Sife, . . . ,Sdk) is merely a shift (without rotating) of (0, ii, . . . , id) in M 2 . The larger 
Afc is, the larger the discrepancy is between the two patterns. As far as the estimation 
for m(x) is concerned, we should not treat all {Y(sfc), X(sfc)} equally, as we did for 
regularly spaced data. Instead we give more weight to the observations {Y(sk), X(sfc)} 
with smaller values of By taking this into account, an argument similar to that in 
Section 2.2 leads to the backfitting estimation (2.12) in which, however, now 

N N 

fj( x j) = w k K h{x 3 - Y(s jk )}, fhj(xj) = j— WkY(s k )K h {xj - Y(s jk )}, 
fe=i Jj\ x o) k=l 
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N 

fji(x 3 ,xi) = y^WkKhjxj - Y(s jk )}Kh{x£ - Y(stk)}, 
fe=i 

where the weight function Wk = W{\k/V)/YLj=x W(\j/b), W(-) is a kernel function, and 
b > is a bandwidth. The associated issues on inference, computation and asymptotic 
properties are subject to further investigation. An alternative that could be explored is 
to replace ii, . . . , i<j by an average set of d neighbourhood points i' l7 . . . , where i'j = 

SfcLi( s fcj — s fc)' J = 1 , - ■ • , rf, in which s' fcj is the jth nearest neighbour of s&. 
Finally, we note that for observations taken irregularly over space and regularly over 
time, the method proposed in Section 2.2 may be applied directly if we only use the data 
taken at the fixed location but over different times in the estimation. Technically this 
reduces to a problem of multivariate time series modelling. However, there is an added 
advantage: the inference does not rely on the assumption of the stationarity over space. 



Appendix 

A.l. Proof of Theorem 1 

We only need to justify conditions (A1)-(A3) in Mammen et al. [15]. The required result 
follows from their Theorem 1 immediately. 
Condition (Al) requires that for j ^ k, 

P%(xj,x k ) f p 2 jk (x 3 ,x k ) 

dXj dXk = / — -. — : — -, — r dXj dXk < CO, 



Pk{x k )Pj(xj) J a.. Pk(x k )pj(Xj) 

which is guaranteed by (C5). By Theorem 2 of Yao [26], 

{{ lo N \ ^ N 

Similarly, we may show that 

Pjk(xj,x k ) -pjk(xj,x k )\ =O p \ h 2 + ( '"f,^ ] 

{xj ,x k )<EA jk 

Furthermore, it is easy to see from Theorem 3 below that 



1 /2 

sup \p jk (xj ,x k )- Pjk (xj , x k ) I = O p lh 2 + ( J I . (A.2) 

,Xk)£Aj k L \ / J 



sup \fhj(xj) — ~Efhj(xj)\ — O p (l), sup \Efhj(xj)\<C. (A. 3) 

Note that fjk(xj,Xk) = fkj(xk,Xj). Condition (C5) implies that (xj,Xk) S A/fc if and only 
if (xk,Xj) £ Akj for any j ^ k. This, together with (A.1)-(A.3), entails conditions (A2) 
and (A3) of Mammen et al. [15]. 
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Let e(s) = Y(s) - m{X(s)}, and 



N 



x $}e(s*) + m{X(s,)} - m°{^{s,)}]K h {x 3 - Y(s e - 



(A.4) 



fh b Ax 3 ) 



I(x 3 G A,-) 



iV 



^ ? 7 l {X(s £ )}^{x J -y(s £ -i,)}. 



Then fh 3 (x 3 ) = fhj(xj) + fh b (xj). Let rrij(xj) and m b (x 3 ) be defined by (3.6) with fh 3 (x 3 ) 
replaced by m"(xj) and m b {x 3 ), respectively. We first introduce two lemmas. 

Lemma 1. Under conditions (C1)-(C5) ; for any j =/= k. 



and 



E 



p k (x k )dx k 



o p (h 2 ) 



— , . — mAxj)&Xj 
Pk{x k ) 3 



o p (h 4 ). 



Lemma 2. Under conditions (C1)-(C5), J m°(xj)pj(xj) dxj = o p (h 2 ), and 



sup Ifhjixj) - %{xj)\ = o p (h ), 

x k £A k 



\rh b Axj) - /',(•'•.,) 2 Pj{x 3 )dxj = o p (h 4 ). 



Based on (3.11), Lemma 1 may be proved in the same manner as (A6) in Appendix 
A of Mammen et al. [15]. The proof of Lemma 2 is similar to the proofs of (114), (112) 
and (113) in Mammen et al. [15]. 

We now sketch the proof of Theorem 2. Note that x = (xi, . . . ,Xd) T is an inner point 
of A. Lemma 1 implies condition (A6) of Mammen et al. [15] with An = h 2 . By Theorem 3 
below, condition (A9) of Mammen et al. [15] also holds. By Theorem 3 of Mammen et 
al. [15], condition (A7) in their paper also holds. It may be proved that 



m a Ax)p 3 {x) dx I I p 3 (y) dy = O^N- 1 ' 2 ) = o p (h 2 ) 



Now it follows from Theorems 2 and 3 of Mammen et al. [15] that 



TOj-fo) = m a Ax 3 ) + h 2 Pj{xj) + o p (h 2 ). 



(A.5) 
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Note that 



E[e(si)K h { Xj - Y(s e - i 3 -)}] - E[E{e{si)\X{s e )}K h { Xj - Y(s, - i,)}] 







and 



E[{m{X(s e )} - m°{X(st)}}K h {z - Y(s e - i 3 -)}] 

= / /v,j(z - Xj) \ {m(x) - m°(x)}p(x) dx fc dx 3 = 0. 
7 fc#J J 



(A.6) 



The last equality in the above expression follows from (3.11). Now the required central 
limit theorem follows from (A. 5), (A. 4), (A.6) and the theorem in Bolthauscn [2]. 

A. 3. Uniform convergence rates for regression estimation 

First, we introduce some notation. Let {(Y(sj), X(sj)), 1 < j < N} be observations from 
a two-dimensional strictly stationary spatial process with {si, . . . ,sjv} given as in (2.7). 
Let /(•) be the density function of X(s) and m(x) = E{y(s)|A'(s) = x}. We define the 
Nadaraya- Watson estimator fh(x) —r{x)/f(x) with 



We introduce some regularity conditions. 

(CI') m(-) has continuous first derivative, /(•) has continuous second derivative, and 
the joint density function of X(s) and X(s + i) is bounded by a constant inde- 
pendent of i. Furthermore, (3.2) holds. 

(C4') a(k;k',j) < Ck~® for any k,j and k' = 0(k 2 ), where a is defined as in (3.1) 
with X(s) replaced by X(s). 

Theorem 3. Let A be a compact set contained in the support of /(■). Under conditions 




(CI'), (C2), (C3) and (C4'), 




(A.7) 



and 



{ 



( 



log 



) 



1/2 



} 



sup \ fh(x) — Em(x)\ = O. 



+ h 



2 



(A.8) 



p 
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Proof. We first prove (A. 7). Let r(x) = m(x)f(x). For any > and e > 0, 

P< sup aN\fh{x) — m{x)\ > e 

- P \ „ N ~, . ( sup l^fo) ~ r ( x )\ + maxm(z) sup |/(x) - f(x)\ ) >e 
{mf yeA f(y)\xeA Z ^ A xeA 

< P\ Cxa N sup \?{x) - r{x)\ + C 2 a N sup \f(x) - f(x)\ > e 

y xGA xEA 



+ Pi •m£ \f{x)-f(x)\>r 

[x<EA 

where C\, C2,r > are constants. It follows from Theorem 2 of Yao [26] that the second 
term of the right-hand side of the above expression may be arbitrarily small for all 
sufficiently large N . Then (A. 7) follows from Theorem 2 of Yao [26] and 

1 /2 

su V \?(x)-r{x)\=O p \( l ^f) + h 2 ), (A.9) 



xeA 



which will now be established. 

Partition A into L subintervals {Ij} of equal length. Let Xj be the centre of Ij. Since 

N , N 



1 1 c 

\r(x)-r(x')\<-^\Y( Sj )\\K h {X( Sj )- x }-K h {X( Sj )-x'}\<-^ 

3=1 3=1 1 

we have that |Ef(x) — Ef(x')| < C7i _1 |o; — x'\. Hence 



sup \r(x) - Er(x)\ < max \r{xj) - Er(xj)\ + O p [ — ). (A. 10) 

xeA i<j<L yLh; 



For large M > 0, define 

JV 



f i( x ) = Jj T,Y( S] )I{\Y( Sj )\ < M}K h {X( Sj ) - x], 

3 = 1 
1 W 

^ = l^E y ^) 7 {l y ( s 3)l > M}^{X( Si ) - a;} 



3 = 1 

and 



n(x) = E[y( Bj )/{|l r (s i )| < M}\X( Sj ) = x], r 2 (x) = E[Y( Sj )I{\Y( Sj )\ > M}\X( Sj ) = x 
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Then r(x) = fi (x) + r 2 (x) and r(x) = r\{x) + r 2 (x). Since < Ch 1 , it follows from 

the second inequality in Theorem 1 of Yao [26] that 

r 2 2 ~| / ACM\ 1 ^ 2 
P{|f 1 (x)-Ef 1 (x)|> £ }<8cxp -— q — +44 1 + — ;- 9 2 «([Pi]A[p2];[piP2],JV), 



where g = [(eM) 1 / 2 ^! A iV 2 )], P; = iVi/(2g) and 

, , 2 32g 4 Cpip 2 CMe d CMe C 2 Me 

v{i) <^tr — J— + —T- = h+—r- -—r~ ' 

Iv^ n h Pip 2 n h, h 

where C, C\,C 2 > are constant. The first inequality in the above expression can be 
verified similarly to the variance expression in Proposition 1 of Yao [26] , and the second 
inequality is obvious by setting M = log N and e 2 = 8aC\ogN/(N\ A N 2 ) 2 h for some 
constant a > 0. Now 

f eV 1 f £ 2 (Y! AiV 2 ) 2 M Ar „ 



8^(g) 2 J ~ I 8C 
On the other hand, condition (C4') entails that 

— J 9 2 a(piAp 2 ;[piP2],iV)<cf — J eMiV( £ M)^/ 2 

= 0{ N- f3/4+3/i h- f3/i - 3/i (\ogN) 3p/i+7/i }. 

Let L = [(N/h) 1 /' 2 }. Hence, 



< L{N- a + iV-/ 3 / 4 +3/ 4 / l -/ 3 / 4 -3/ 4 (l giV)3/3/4+7/4| ^ Q> (A n) 

see condition (3.3). On the other hand, 

P{\r 2 (x) - Er 2 (x)| > e} < NP{\Y(b)\ > M} < Y C - AM E{c A l y ( s )l} = 0(N- X+1 ), 
where A > is a constant. Hence 

pj max. \r 2 (x) - Er 2 (x)\ > e\ < 0(LN- x+1 ) -> 0; 



see (3.2). Combining this with (A. 11) and (A. 10), we have 



P< sup \r(x) - Er(x)\ >e}=0 



logiV^ 1/2 
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Now (A. 9) follows from this and the fact that sup^g^ |Er(a;) — r(x) \ = 0(h 2 ), which may 
be established via simple algebraic manipulation. 

To prove (A.8), it follows from (A.7) and Theorem 2 of Yao [26] that for any e > 
there exists a r > such that 

sup|Em(x) - E{m(x)I(\f(x) - f(x)\ < h 2 r)}\ < e. 

Now, for x E A, 

sup|E{a(x)/(|/(x) - f(x)\ < h 2 r)} - m(x)\ 

< sup\E{r(x)/f(x)I(\f(x) - f(x)\ < h 2 r)} - m(x)\ + Ch 2 

< sup\E{r(x)/f(x)}~m{x)\+e + C 1 h 2 <e + C 2 h 2 . 

x£A 

holds uniformly. Hence (A.8) follows from (A.7). □ 
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